 cancer risk


Seven million cancers a year are preventable, says report

BBC News

Seven million people's cancer could be prevented each year, according to the first global analysis. A report by World Health Organization (WHO) scientists estimates 37% of cancers are caused by infections, lifestyle choices and environmental pollutants that could be avoided. This includes cervical cancers caused by human papilloma virus (HPV) infections which vaccination can help prevent, as well as a host of tumours caused by tobacco smoke from cigarettes. The researchers said their report showed there is a powerful opportunity to transform the lives of millions of people. Some cancers are inevitable - either because of damage we unavoidably build up in our DNA as we age or because we inherit genes that put us at greater risk of the disease.


Association between nutritional factors, inflammatory biomarkers and cancer types: an analysis of NHANES data using machine learning

Liu, Yuqing, Zhao, Meng, Hu, Guanlan, Zhang, Yuchen

arXiv.org Artificial Intelligence

Background. Diet and inflammation are critical factors influencing cancer risk. However, the combined impact of nutritional status and inflammatory biomarkers on cancer status and type, using machine learning (ML), remains underexplored. Objectives. This study investigates the association between nutritional factors, inflammatory biomarkers, and cancer status, and whether these relationships differ across cancer types using National Health and Nutrition Examination Survey (NHANES) data. Methods. We analyzed 24 macro- and micronutrients, C-reactive protein (CRP), and the advanced lung cancer inflammation index (ALI) in 26,409 NHANES participants (2,120 with cancer). Multivariable logistic regression assessed associations with cancer prevalence. We also examined whether these features differed across the five most common cancer types. To evaluate predictive value, we applied three ML models - Logistic Regression, Random Forest, and XGBoost - on the full feature set. Results. The cohort's mean age was 49.1 years; 34.7% were obese. Comorbidities such as anemia and liver conditions, along with nutritional factors like protein and several vitamins, were key predictors of cancer status. Among the models, Random Forest performed best, achieving an accuracy of 0.72. Conclusions. Higher-quality nutritional intake and lower levels of inflammation may offer protective effects against cancer. These findings highlight the potential of combining nutritional and inflammatory markers with ML to inform cancer prevention strategies.


QuaLLM-Health: An Adaptation of an LLM-Based Framework for Quantitative Data Extraction from Online Health Discussions

Kouzy, Ramez, Attar-Olyaee, Roxanna, Rooney, Michael K., Hassanzadeh, Comron J., Li, Junyi Jessy, Mohamad, Osama

arXiv.org Artificial Intelligence

Health-related discussions on social media like Reddit offer valuable insights, but extracting quantitative data from unstructured text is challenging. In this work, we adapt the QuaLLM framework into QuaLLM-Health for extracting clinically relevant quantitative data from Reddit discussions about glucagon-like peptide-1 (GLP-1) receptor agonists using large language models (LLMs). We collected 410k posts and comments from five GLP-1-related communities using the Reddit API in July 2024. After filtering for cancer-related discussions, 2,059 unique entries remained. We developed annotation guidelines to manually extract variables such as cancer survivorship, family cancer history, cancer types mentioned, risk perceptions, and discussions with physicians. Two domain experts independently annotated a random sample of 100 entries to create a gold-standard dataset. We then employed iterative prompt engineering with OpenAI's "GPT-4o-mini" on the gold-standard dataset to build an optimized pipeline that allowed us to extract variables from the large dataset. The optimized LLM achieved accuracies above 0.85 for all variables, with macro-averaged precision, recall, and F1 score above 0.90, indicating balanced performance. Stability testing showed a 95% match rate across runs, confirming consistency. Applying the framework to the full dataset enabled efficient extraction of variables necessary for downstream analysis, costing under $3 and completing in approximately one hour. QuaLLM-Health demonstrates that LLMs can effectively and efficiently extract clinically relevant quantitative data from unstructured social media content. Incorporating human expertise and iterative prompt refinement ensures accuracy and reliability. This methodology can be adapted for large-scale analysis of patient-generated data across various health domains, facilitating valuable insights for healthcare research.
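
For readers unfamiliar with the evaluation terms in this abstract, the following is a minimal sketch of macro-averaged precision, recall, and F1, plus a simple match rate between two runs. It is not the authors' pipeline. The function names, the label values, and the definition of "match rate" as the fraction of entries with identical outputs in two runs are all assumptions made for illustration.

```python
from collections import Counter


def macro_prf(gold, pred):
    """Macro-averaged (precision, recall, F1) over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    # Macro averaging: every label counts equally, regardless of frequency.
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


def match_rate(run_a, run_b):
    """Fraction of entries where two runs produced the same output (assumed definition)."""
    same = sum(1 for a, b in zip(run_a, run_b) if a == b)
    return same / len(run_a)


# Hypothetical annotations for one variable, e.g. "discussed with physician".
gold = ["yes", "yes", "no", "no", "unknown"]
pred = ["yes", "no", "no", "no", "unknown"]
precision, recall, f1 = macro_prf(gold, pred)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Macro averaging weights rare labels as heavily as common ones, which is why it is a reasonable check for "balanced performance" when some annotation categories appear only a few times.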


Some breast cancer patients could be at risk of another type of cancer, study reveals

FOX News

Women with breast cancer who have received chemotherapy are at an increased risk of developing lung cancer, a new study suggests. Epic Research, a health data group based in Delaware, found that women in this category have a 57% higher lung cancer risk than those who received radiation. In comparison to patients who received endocrine therapy, those who have undergone chemo have a 171% increase in lung cancer risk, the study found. In a statement sent to Fox News Digital, the Epic Research team said the key takeaway from their research is that primary lung cancer is more than twice as prevalent in women who were previously diagnosed with breast cancer -- compared to those who did not have it.


Leveraging Transformers to Improve Breast Cancer Classification and Risk Assessment with Multi-modal and Longitudinal Data

Shen, Yiqiu, Park, Jungkyu, Yeung, Frank, Goldberg, Eliana, Heacock, Laura, Shamout, Farah, Geras, Krzysztof J.

arXiv.org Artificial Intelligence

Breast cancer screening, primarily conducted through mammography, is often supplemented with ultrasound for women with dense breast tissue. However, existing deep learning models analyze each modality independently, missing opportunities to integrate information across imaging modalities and time. In this study, we present Multi-modal Transformer (MMT), a neural network that utilizes mammography and ultrasound synergistically, to identify patients who currently have cancer and estimate the risk of future cancer for patients who are currently cancer-free. MMT aggregates multi-modal data through self-attention and tracks temporal tissue changes by comparing current exams to prior imaging. Trained on 1.3 million exams, MMT achieves an AUROC of 0.943 in detecting existing cancers, surpassing strong uni-modal baselines. For 5-year risk prediction, MMT attains an AUROC of 0.826, outperforming prior mammography-based risk models. Our research highlights the value of multi-modal and longitudinal imaging in cancer diagnosis and risk stratification.
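
The AUROC values quoted in this abstract can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes AUROC directly from that definition. It is an illustration only: the function name and the toy labels and scores are hypothetical and have no connection to the MMT model or its data.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical exam-level labels (1 = cancer) and model scores.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3]
print(auroc(labels, scores))
```

The pairwise loop is quadratic in the number of cases, so it suits small examples; rank-based formulations give the same value more efficiently on large datasets.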


Combining Survival Analysis and Machine Learning for Mass Cancer Risk Prediction using EHR data

Philonenko, Petr, Kokh, Vladimir, Blinov, Pavel

arXiv.org Artificial Intelligence

Purely medical cancer screening methods are often costly, time-consuming, and hard to apply at scale. Advanced Artificial Intelligence (AI) methods greatly help cancer detection but require specific or deep medical data. These aspects hinder the mass implementation of cancer screening methods. For these reasons, applying AI methods for mass personalized assessment of cancer risk based on the existing volume of Electronic Health Records (EHR) would be a disruptive change for healthcare. This paper presents a novel method for mass cancer risk prediction using EHR data. Compared with other methods, ours stands out for its minimal data requirements: it needs only a history of medical service codes and diagnoses from the EHR. We formulate the problem as binary classification. The dataset contains 175,441 de-identified patients (2,861 diagnosed with cancer). As a baseline, we implement a solution based on a recurrent neural network (RNN). We propose a method that combines machine learning and survival analysis since these approaches are less computationally heavy, can be combined into an ensemble (the Survival Ensemble), and can be reproduced in most medical institutions. We test the Survival Ensemble in several studies. Firstly, we obtain a significant difference in the primary metric (Average Precision): 22.8% (ROC AUC 83.7%, F1 17.8%) for the Survival Ensemble versus 15.1% (ROC AUC 84.9%, F1 21.4%) for the Baseline. Secondly, the performance of the Survival Ensemble is also confirmed during the ablation study. Thirdly, our method exceeds age baselines by a significant margin. Fourthly, in the blind retrospective out-of-time experiment, the proposed method is reliable in cancer patient detection (9 out of 100 selected). Such results exceed the estimates of medical screenings, e.g., the best Number Needed to Screen (9 out of 1000 screenings).
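
Two quantities in this abstract lend themselves to a short worked example: Average Precision, the primary metric, and the comparison of detection rates (9 out of 100 selected versus 9 out of 1000 screened). The sketch below is a plain-Python illustration, not the authors' code. The function names and the toy labels and scores are hypothetical; only the 9/100 and 9/1000 figures come from the abstract.

```python
def average_precision(y_true, scores):
    """Mean of precision@k, taken at each rank k where a true positive appears.

    Patients are ranked by descending score; ties keep their input order.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` patients
    return total / hits if hits else 0.0


def detection_rate(found, examined):
    """Share of examined patients in whom cancer was found."""
    return found / examined


# Hypothetical risk scores for five patients (1 = later diagnosed with cancer).
y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(round(average_precision(y_true, scores), 4))

# Figures quoted in the abstract.
model_rate = detection_rate(9, 100)       # 9 out of 100 selected
screening_rate = detection_rate(9, 1000)  # 9 out of 1000 screenings
print(model_rate, screening_rate, round(model_rate / screening_rate, 1))
```

Average Precision is a natural primary metric here because only about 1.6% of patients in the dataset are positive (2,861 of 175,441), a setting in which ROC AUC can look strong even when few of the top-ranked patients are true cases.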


Artificial intelligence tool developed to predict risk of lung cancer

#artificialintelligence

Lung cancer is the leading cause of cancer death in the United States and around the world. Low-dose chest computed tomography (LDCT) is recommended to screen people between 50 and 80 years of age with a significant history of smoking, or who currently smoke. Lung cancer screening with LDCT has been shown to reduce death from lung cancer by up to 24 percent. But as rates of lung cancer climb among non-smokers, new strategies are needed to screen and accurately predict lung cancer risk across a wider population. A study led by investigators from the Mass General Cancer Center, a member of Mass General Brigham, in collaboration with researchers at the Massachusetts Institute of Technology (MIT), developed and tested an artificial intelligence tool known as Sybil.


The effect of variable labels on deep learning models trained to predict breast density

Squires, Steven, Harkness, Elaine F., Evans, D. Gareth, Astley, Susan M.

arXiv.org Artificial Intelligence

Purpose: High breast density is associated with reduced efficacy of mammographic screening and increased risk of developing breast cancer. Accurate and reliable automated density estimates can be used for direct risk prediction and passing density related information to further predictive models. Expert reader assessments of density show a strong relationship to cancer risk but also inter-reader variation. The effect of label variability on model performance is important when considering how to utilise automated methods for both research and clinical purposes. Methods: We utilise subsets of images with density labels to train a deep transfer learning model which is used to assess how label variability affects the mapping from representation to prediction. We then create two end-to-end deep learning models which allow us to investigate the effect of label variability on the model representation formed. Results: We show that the trained mappings from representations to labels are altered considerably by the variability of reader scores. Training on labels with distribution variation removed causes the Spearman rank correlation coefficients to rise from $0.751\pm0.002$ to either $0.815\pm0.006$ when averaging across readers or $0.844\pm0.002$ when averaging across images. However, when we train different models to investigate the representation effect we see little difference, with Spearman rank correlation coefficients of $0.846\pm0.006$ and $0.850\pm0.006$ showing no statistically significant difference in the quality of the model representation with regard to density prediction. Conclusions: We show that the mapping between representation and mammographic density prediction is significantly affected by label variability. However, the effect of the label variability on the model representation is limited.
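
The Spearman rank correlation coefficients reported in this abstract measure how well the ordering of predicted densities agrees with the ordering of reader labels. A minimal sketch of the statistic follows, computed as the Pearson correlation of average ranks so that tied values are handled. The function names and the toy density values are hypothetical and are not taken from the paper.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical reader density labels and model predictions for five images.
reader_labels = [10, 20, 30, 40, 50]
model_preds = [12, 18, 45, 33, 60]
print(round(spearman(reader_labels, model_preds), 3))
```

Because the statistic depends only on ranks, it is insensitive to any monotonic rescaling of the predicted densities, which makes it suitable for comparing models trained on differently processed labels.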


An Artificial Intelligence Outlook for Colorectal Cancer Screening

Katrakazas, Panagiotis, Ballas, Aristotelis, Anisetti, Marco, Spais, Ilias

arXiv.org Artificial Intelligence

Colorectal cancer is the third most common tumor in men and the second in women, accounting for 10% of all tumors worldwide. It ranks second in cancer-related deaths with 9.4%, following lung cancer. The decrease in mortality rate documented over the last 20 years has shown signs of slowing down since 2017, necessitating concentrated actions on specific measures that have exhibited considerable potential. As such, the technical foundation and research evidence for blood-derived protein markers have been set, pending comparative validation, clinical implementation and integration into an artificial intelligence enabled decision support framework that also considers knowledge on risk factors. The current paper aspires to constitute the driving force for creating change in colorectal cancer screening by reviewing existing medical practices through accessible and non-invasive risk estimation, employing a straightforward artificial intelligence outlook.


Artificial Intelligence Can See Breast Cancer Before It Happens

#artificialintelligence

The use of artificial intelligence (AI) and deep learning (DL) in the medical and healthcare field has been increasing at an astonishing rate. While the Health Insurance Portability and Accountability Act (HIPAA) is important for the protection of personal health information, it has been the biggest barrier to gathering the large data sets required for deep learning. Several strategies have been successfully implemented to gather large amounts of data for training medical AI systems without risking patient privacy. AI continues to have a significant impact on medical imaging, and deep learning models are constantly being developed to look for anomalies such as bone fractures or possible cancer. The introduction of breast cancer screening has helped to reduce cancer mortality rates in women as well as provide a consistent source of image data.